Skip to content

TYP: pandas/core/frame.py #38416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Mar 15, 2021
Merged

Conversation

arw2019
Copy link
Member

@arw2019 arw2019 commented Dec 11, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I plan to do a pass through the whole file - putting it up in case anybody has feedback on what I have

@arw2019 arw2019 changed the title [WIP] TYP: pandas/core/frame.py TYP: pandas/core/frame.py Dec 11, 2020
@jreback
Copy link
Contributor

jreback commented Dec 11, 2020

suggest that you do reasonably sized pieces so this doesn't sit in review for too long - better multiple but smaller prs as this is a huge file

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer you don't mix easy and hard ones. e.g. a pass on bool / strings ones. and Axis / Level. These are easy to review. by mixing lots of things its very hard to give good scrutiny here.

would prefer that.

data,
orient: str = "columns",
dtype: Optional[Dtype] = None,
columns: Optional[List] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this should be Arraylike here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean columns? The doctring says list (columns is actually column labels so it seems to make sense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be Optional[List[Label]]

@@ -1440,7 +1456,7 @@ def to_numpy(

return result

def to_dict(self, orient="dict", into=dict):
def to_dict(self, orient: str = "dict", into=dict) -> Union[Dict, List, Mapping]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping should be enough here, why did you add List?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with orient="records" we return a list:

pandas/pandas/core/frame.py

Lines 1589 to 1599 in 36c4d5c

elif orient == "records":
columns = self.columns.tolist()
rows = (
dict(zip(columns, row))
for row in self.itertuples(index=False, name=None)
)
return [
into_c((k, maybe_box_datetimelike(v)) for k, v in row.items())
for row in rows
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can at least get rid of Dict then - no reason to include Dict and Mapping unless I am overlooking something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right - done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with orient="records" we return a list:

we should probably overload using Literal to avoid Union return types. but not in this pass.

@@ -1161,7 +1168,7 @@ def __len__(self) -> int:
"""
return len(self.index)

def dot(self, other):
def dot(self, other: Union[AnyArrayLike, DataFrame]) -> FrameOrSeriesUnion:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be explict and add Series to these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -3311,7 +3329,7 @@ def _box_col_values(self, values, loc: int) -> Series:
# ----------------------------------------------------------------------
# Unsorted

def query(self, expr, inplace=False, **kwargs):
def query(self, expr: str, inplace: bool = False, **kwargs) -> Optional[DataFrame]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this is correct for the output type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right. From docstring:

DataFrame resulting from the provided query expression or None if ``inplace=True``.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no my point is that the doc-string is wrong. this can be a Series or Scalar. basically this can be anything. I would remove it for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - removed (and will fix the docstring in follow-on, too)

def eval(self, expr, inplace=False, **kwargs):
def eval(
self, expr: str, inplace: bool = False, **kwargs
) -> Optional[Union[AnyArrayLike, Scalar]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the output type here is suspect

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From docstring:

        Returns
        -------
        ndarray, scalar, pandas object, or None
            The result of the evaluation or None if ``inplace=True``.

so does Optional[Union[AnyArrayLike, DataFrame, Scalar]] work?

def apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds):
def apply(
self,
func: Callable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are more specific on this, though it maybe difficult

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking - maybe we could do something like what was done for AggFuncType

@@ -612,7 +612,7 @@ def crosstab(
margins=margins,
margins_name=margins_name,
dropna=dropna,
**kwargs,
**kwargs, # type: ignore[arg-type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add type ignores

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I'll skip pivot in this PR (this is a weird issue with typing **kwargs)

@jreback
Copy link
Contributor

jreback commented Dec 12, 2020

cc @simonjayhawkins @WillAyd

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Dec 12, 2020
@arw2019
Copy link
Member Author

arw2019 commented Dec 13, 2020

would prefer you don't mix easy and hard ones. e.g. a pass on bool / strings ones. and Axis / Level. These are easy to review. by mixing lots of things its very hard to give good scrutiny here.

would prefer that.

For sure. I'll keep the hard ones that I've already attempted here so all the comments are in one place. I'll resubmit easy and follow-ons in separate PRs

@jreback
Copy link
Contributor

jreback commented Dec 14, 2020

needs merge master and will look

@arw2019
Copy link
Member Author

arw2019 commented Dec 14, 2020

mypy green + all comments addressed (or moved out to orthogonal PRs)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - just a few things

@@ -1440,7 +1456,7 @@ def to_numpy(

return result

def to_dict(self, orient="dict", into=dict):
def to_dict(self, orient: str = "dict", into=dict) -> Union[Dict, List, Mapping]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can at least get rid of Dict then - no reason to include Dict and Mapping unless I am overlooking something

@@ -7802,7 +7828,7 @@ def apply(
)
return op.get_result()

def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
def applymap(self, func: Callable, na_action: Optional[str] = None) -> DataFrame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be typed more strictly than Callable? Subscripting Callable is much more useful to a reader if possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - @jreback asked for this too. I'll put up a follow-on for this (to avoid expanding the scope of this PR)

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 6, 2021
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

@arw2019 if you want to merge master and update (or close and open a new one ok too)

@arw2019
Copy link
Member Author

arw2019 commented Feb 22, 2021

@jreback @simonjayhawkins updated, CI green, comments addressed

@simonjayhawkins
Copy link
Member

@jreback @simonjayhawkins updated, CI green, comments addressed

ok. maybe we should merge this. there has been plenty of discussion here.

@simonjayhawkins
Copy link
Member

@jreback @WillAyd i'm merging later today if no objections.

mypy is green and typing PRs easily become stale or get abandoned.

There are no ignores, casts or asserts added here and no existing annotations changed/deleted.

if any of the typing additions added here are problematic they will be picked up by mypy as more types are added.

@jreback
Copy link
Contributor

jreback commented Mar 15, 2021

sure let's make sure this is rebased first

@simonjayhawkins
Copy link
Member

yep did that locally and ran mypy before I commented. will push and merge on green

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@simonjayhawkins simonjayhawkins merged commit 9c31717 into pandas-dev:master Mar 15, 2021
@simonjayhawkins
Copy link
Member

Thanks @arw2019

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants